We introduce an approach to partitioning networks into communities that not only determines 
the best community structure, but also provides a range of characterization techniques to assess how 
significant that structure is. We study the thermodynamics of community structure by producing 
equilibrium ensembles of partitions, in which each partition is represented with a well-defined statis- 
tical weight. Thus we are able to study the temperature dependence of thermodynamic properties, 
namely the modularity Q and heat capacity, with particular emphasis on the transition between 
high-temperature, essentially random partitions and low-temperature partitions with high modu- 
larity. We also look at frequency matrices that measure the likelihood that two nodes belong to the 
same community, and introduce an order parameter to measure the 'blockiness' of the frequency 
matrix, and therefore the uniqueness of the community structure. These methods have been applied 
to a number of model networks in order to understand the effects of the degree distribution, spatial 
embedding and randomization. Finally, we apply these methods to a metabolic network known to 
have strong community structure and find hierarchical community structure, with some communities 
being more robust than others. 



I. INTRODUCTION 

Many networks do not have a homogeneous topology, 
but are composed of interacting modules of nodes with 
a high density of edges within the modules. For exam- 
ple, in the world wide web, web sites covering similar 
topics group together into modules. In social net- 
works, communities naturally form, where people having 
certain characteristics in common are more likely to be 
acquainted. Examples include communities of scientists 
working on similar areas of research [2, 3], jazz musicians 
grouped by race [4], departments in an organization 
and authors of home pages who have some common inter- 
ests. Many non-social networks also have underlying 
community structure. For example, biological networks 
often consist of functional modules. 

These modules can be determined purely from the 
topology of the network, removing human bias. Knowl- 
edge of the modules can then lead to further understand- 
ing of a network. For example, nodes in the same com- 
munity usually have some properties in common, facili- 
tating classification of nodes. This can be used to find 
similar web pages [1], aiding search engines and content 
filtering. In biological networks, it can be used to find 
metabolites, proteins or genes performing similar func- 
tions, and to understand the role of different elements. 

Processes occurring on a network are affected by its 
modularity, for example communication patterns 
or web searching. Trends spread well within commu- 
nities, but diffusion between different communities is 
slow. The amount of time an epidemic lasts 
on a network has been shown to be a function of the 
number of modules. Furthermore, although many 
networks have short paths, including random networks, 
the short paths can be very difficult to find. Commu- 
nities in a social network assist in finding short paths. 
Modularity can also be used to simplify the 
amount of data in large complex networks, for example, 
through coarse-graining or through char- 
acterization of networks by studying the occurrence of 
motifs. 

For networks with strong community structure, a key 
question is how significant or real that community 
structure is. Most studies on community structure involve 
finding the 'best' partition of nodes into communities 
[27], where best is usually defined in terms of a quan- 
tity called the modularity [29]. Occasionally, compar- 
ison to random networks with the same degree distri- 
bution is made to try to assess the significance of any 
observed modularity [30, 31]. However, by studying 
just the one best partition, with each node assigned to 
one community, a lot of useful information is lost. Re- 
cently it has become clear that many networks are not 
simply composed of distinct, weakly interacting modules 
[32]. For example, the modules could be 'fuzzy', in the 
sense that some nodes belong to more than one commu- 
nity, as in social networks where people belong to many 
overlapping communities [36]. The 
modules could also join together, forming larger modules 
in a hierarchical manner. 

A further consideration is the robustness or unique- 
ness of a given partition. Assessing robustness means asking 
not just what the best community structure is, but whether it is well de- 
fined. In some cases, there can be many partitions with 
similarly high modularity, but with a very different par- 
titioning of nodes into communities. This indicates that 
the community structure is probably not particularly in- 
formative, an extreme example being lattices. Net- 
works based on lattices are of course very uniform, with 
identical environments around each lattice point. How- 
ever, applying a community detection algorithm to a lat- 
tice uncovers partitions with high modularity [30, 31]. 
Judged only in comparison to a randomized graph, 
this community structure would be deemed 
significant. However, any partition of a lattice is 
non-unique, since translation of the community bound- 
aries by any lattice vector will generate a new partition 
with identical modularity. Although the lack of unique- 
ness in the example of a lattice is transparent, it would 
be useful to have a general method to test for the lack of 
robustness in obtained community structures. 
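
The translational degeneracy of lattice partitions can be checked numerically. The following is a minimal sketch, not the procedure used in this work: it assumes the degree-6 lattice is represented as a triangular lattice on a skewed L x L grid with periodic boundaries, and it uses simple square blocks (hypothetical helper `block_partition`) rather than an optimized partition.

```python
from itertools import product

def triangular_lattice(L):
    """Edge list of an L x L degree-6 lattice with periodic boundaries."""
    idx = lambda x, y: (x % L) * L + (y % L)
    edges = set()
    for x, y in product(range(L), repeat=2):
        # three "forward" neighbours; the other three come from symmetry
        for dx, dy in ((1, 0), (0, 1), (1, 1)):
            edges.add(frozenset((idx(x, y), idx(x + dx, y + dy))))
    return [tuple(e) for e in edges]

def modularity(edges, labels):
    """Q = sum_c [ e_c/M - (sum_{i in c} k_i / 2M)^2 ]."""
    M = len(edges)
    inside, ends = {}, {}
    for u, v in edges:
        ends[labels[u]] = ends.get(labels[u], 0) + 1
        ends[labels[v]] = ends.get(labels[v], 0) + 1
        if labels[u] == labels[v]:
            inside[labels[u]] = inside.get(labels[u], 0) + 1
    return sum(inside.get(c, 0) / M - (ends[c] / (2 * M)) ** 2 for c in ends)

def block_partition(L, b, shift=0):
    """Assign node (x, y) to a b x b block; boundaries are shifted by `shift` in x."""
    return {x * L + y: (((x - shift) % L) // b, y // b)
            for x, y in product(range(L), repeat=2)}

edges = triangular_lattice(15)              # 225 nodes, as for the lattice studied later
original = block_partition(15, 5)
translated = block_partition(15, 5, shift=2)
print(modularity(edges, original), modularity(edges, translated))
```

The two partitions assign many nodes differently, yet the two printed modularities agree to within floating-point rounding, which is the non-uniqueness described above.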

In this work, we study equilibrium ensembles of com- 
munities at different temperatures. Previous works have 
begun to look at ensembles of community structures. 
However, the ensembles have usu- 
ally been generated in an ad hoc manner, for example 
by introducing a stochastic element into the algorithms 
or by comparing the partitions obtained at the end of a 
number of runs of the algorithm. Here, the ensembles are 
sampled in a statistically rigorous way, with each parti- 
tion in the ensemble having a well-defined weight. 

The use of temperature is also implicit in some meth- 
ods for optimizing community structure, with better par- 
titions found at lower temperatures. 
Here, we rephrase the question of how unique the 
obtained community structure is as how different the 
partitions obtained at a given temperature are. These dif- 
ferences can be visualised using a matrix showing how 
many times each pair of nodes is classified as being in 
the same community [10, 42]. Uniqueness, hierarchi- 
cal and fuzzy community structures can be seen clearly. 

In Section II we first introduce the methods used and 
the properties we measure. Then in Section III we study 
computer-generated test networks with prescribed com- 
munity structure, introduced by Girvan and Newman 
and often used to test algorithms. To illustrate 
the issue of uniqueness of community structure, a hexag- 
onal lattice is studied in Section IV. We then apply our 
approach to scale-free networks in order to determine the 
effect of the degree distribution. The Apollonian network 
(Section V) is an example of a spatial, scale-free network. 
It has some community structure, grouping 
together nodes that are close in space as for the hexag- 
onal lattice, but with different length scales potentially 
leading to a hierarchical structure. In Section VI we 
study scale-free test networks with introduced commu- 
nity structure similar to the original test networks, the 
difference being that the original test networks have an 
Erdős-Rényi degree distribution. Finally, in Section VII 
we apply the algorithm to a metabolic network describ- 
ing the interactions between metabolites in E. coli. 
Community structure has been uncovered in this net- 
work previously. However, biological net- 
works often have complex, hierarchical community struc- 
ture rather than isolated modules. As such, 
our approach is particularly appropriate. 

II. METHODS 

To find good partitions the modularity is often used, 
and is defined by 

    Q = \sum_c \left[ \frac{e_c}{M} - \left( \frac{\sum_{i \in c} k_i}{2M} \right)^2 \right].

This quantity gives a measure of how many more edges 
are within communities compared to that expected for 
a random network. M is the total number of edges in 
the network, and k_i is the degree of node i. e_c measures 
the number of edges lying within community c, i.e. the 
nodes at either end of these edges are both classified as 
belonging to community c for this partition. \sum_{i \in c} k_i / 2M 
is the fraction of all ends of edges that arrive at commu- 
nity c. If an edge is chosen and followed at random, it 
gives the probability that the node at the end belongs to 
community c. Hence, (\sum_{i \in c} k_i / 2M)^2 gives the predicted 
fraction of edges within community c. 
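
As an illustration of the definition above, the modularity of a partition can be evaluated directly from an edge list. This is a minimal sketch that assumes an undirected, unweighted network given as a list of node pairs and a dictionary `community` mapping each node to a label; both names are illustrative.

```python
from collections import defaultdict

def modularity(edges, community):
    """Q = sum_c [ e_c/M - (sum_{i in c} k_i / 2M)^2 ] for an undirected edge list."""
    M = len(edges)
    e_c = defaultdict(int)   # number of edges with both ends in community c
    k_c = defaultdict(int)   # summed degree of the nodes in community c
    for u, v in edges:
        cu, cv = community[u], community[v]
        k_c[cu] += 1
        k_c[cv] += 1
        if cu == cv:
            e_c[cu] += 1
    return sum(e_c[c] / M - (k_c[c] / (2 * M)) ** 2 for c in k_c)

# Two triangles joined by a single edge, split into the two triangles.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
community = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
print(modularity(edges, community))   # 2 * (3/7 - (7/14)**2) = 5/14
```

Placing all six nodes in a single community gives Q = 0, since then e_c/M = 1 and the expected fraction is also 1.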

To characterize the possible partitioning of the network 
into communities, we generate a canonical ensemble of 
network partitions, with -Q playing the role of energy, 
i.e. at temperature T, the statistical weight of a given par- 
tition in the ensemble is proportional to exp(Q/T). Each 
partition is described by a vector assigning each node to a 
community, similar to the coordinate vector describing a 
configuration. The ensemble is simulated using Metropo- 
lis Monte Carlo. Each move involves changing the com- 
munity assigned to a node. If the move leads to a better 
community structure, i.e. an increase in Q, the move is 
accepted, otherwise it is accepted with a temperature- 
dependent probability exp(ΔQ/T), where ΔQ is the dif- 
ference in Q between the new and old partitions. 

For a network with N nodes and n_c communities, a 
node is chosen with probability 1/N and assigned to ei- 
ther one of the existing communities or to a new commu- 
nity, each with probability 1/(n_c + 1). Because the aim is 
to generate equilibrium ensembles of partitions, detailed 
balance must be maintained, i.e. the probability of at- 
tempting a move (not necessarily accepting it) must be 
equal to the probability of attempting the reverse move. 
If the selected node is in a community of size one, i.e. by 
itself, a move will either leave the structure unchanged, or 
decrease the number of communities. Therefore, a com- 
munity must be chosen with probability 1/n_c, excluding 
the option of starting a new community, which in this 
case is equivalent to choosing the community to which 
the node already belongs. 
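
The move set and acceptance rule described above can be sketched as follows. This is an illustrative sketch rather than the implementation used for the results here: it assumes integer community labels, recomputes Q from scratch for clarity instead of updating it incrementally, and the function names are hypothetical.

```python
import math
import random

def modularity(edges, labels):
    M = len(edges)
    inside, ends = {}, {}
    for u, v in edges:
        ends[labels[u]] = ends.get(labels[u], 0) + 1
        ends[labels[v]] = ends.get(labels[v], 0) + 1
        if labels[u] == labels[v]:
            inside[labels[u]] = inside.get(labels[u], 0) + 1
    return sum(inside.get(c, 0) / M - (ends[c] / (2 * M)) ** 2 for c in ends)

def metropolis_step(edges, labels, T, rng=random):
    """Attempt one single-node move at temperature T; return True if accepted."""
    node = rng.choice(list(labels))               # each node with probability 1/N
    old = labels[node]
    communities = sorted(set(labels.values()))
    alone = sum(1 for c in labels.values() if c == old) == 1
    if alone:
        options = communities                     # n_c choices, no new community
    else:
        options = communities + [max(communities) + 1]   # n_c + 1 choices
    new = rng.choice(options)
    if new == old:
        return False                              # structure unchanged
    q_old = modularity(edges, labels)
    labels[node] = new
    dq = modularity(edges, labels) - q_old
    if dq >= 0 or rng.random() < math.exp(dq / T):
        return True
    labels[node] = old                            # rejected: restore old partition
    return False

edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
labels = {i: i for i in range(6)}                 # each node in its own community
for _ in range(2000):
    metropolis_step(edges, labels, T=0.05)
print(modularity(edges, labels), labels)
```

Recomputing Q for every attempted move costs a pass over all edges; for networks of any size, ΔQ would normally be computed from the neighbours of the moved node only.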

To facilitate the generation of equilibrium ensembles 
even at low temperature, we use parallel tempering [50]. 
In this method, Monte Carlo simulations are run simultane- 
ously at a set of different temperatures, with 
additional exchange moves attempted that involve 
swapping the partitions between two temperatures. Such 
a move is always accepted if the partition at the higher temperature has 
the higher Q value; otherwise it is accepted with probabil- 
ity exp(-Δβ ΔQ), where ΔQ is now the difference in Q 
between the partitions at the two temperatures and Δβ 
is the difference in the inverse temperature β. 
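
The exchange criterion can be written compactly as below. The sign conventions are an assumption made for this sketch: Δβ is taken as 1/T_low - 1/T_high and ΔQ as Q at the lower temperature minus Q at the higher temperature, so that both are positive whenever the move is not accepted automatically.

```python
import math
import random

def accept_exchange(q_low_T, q_high_T, T_low, T_high, rng=random):
    """Decide whether to swap the partitions held at T_low and T_high (energy = -Q)."""
    d_beta = 1.0 / T_low - 1.0 / T_high   # positive, since T_low < T_high
    d_q = q_low_T - q_high_T              # Q at low T minus Q at high T
    if d_q <= 0:
        return True                       # the hotter replica holds the better partition
    return rng.random() < math.exp(-d_beta * d_q)

print(accept_exchange(0.30, 0.50, T_low=0.1, T_high=0.2))   # True: always accepted
```

When the colder replica already holds the better partition, the swap is accepted only occasionally, and less often the further apart the two temperatures are.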

The initial state of each ensemble has each node in a 
separate community. A number of steps are then required 
to equilibrate the system. At the lowest temperature, Q 
tends to a single, maximum value, Q_max. We call the 
corresponding partition the global maximum. Although 
without testing all of the possible partitions it is impos- 
sible to be certain there is not a partition with higher Q 
[51], the use of the parallel tempering method makes it 
likely that the true global maximum has been found for 
the network sizes used here. 

Various properties can be calculated for these ensem- 
bles. We study the variation of Q with temperature, 
seeing an increase towards an optimal value at low tem- 
peratures. The heat capacity, C = -dQ/dT, is used to 
characterize the nature of these changes, with a peak in 
the heat capacity representing a transition. The more 
rapidly Q changes, the sharper the peak will be. We 
define the transition temperature, T_c, as the temperature 
at which this peak occurs. 
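
Given the ensemble-averaged Q on a grid of temperatures, the heat capacity and T_c can be estimated by numerical differentiation. This is a sketch under the assumption that the averages are smooth enough for finite differences; the synthetic curve below is purely illustrative and is not data from this work.

```python
import numpy as np

def heat_capacity(T, Q_mean):
    """Return C = -dQ/dT on the temperature grid and the location of its peak."""
    T = np.asarray(T, dtype=float)
    Q_mean = np.asarray(Q_mean, dtype=float)
    C = -np.gradient(Q_mean, T)        # finite differences on a possibly uneven grid
    return C, T[np.argmax(C)]

# Synthetic example: Q falls from 0.5 to 0 around T = 0.4.
T = np.linspace(0.01, 1.0, 200)
Q = 0.25 * (1.0 - np.tanh((T - 0.4) / 0.05))
C, T_c = heat_capacity(T, Q)
print(T_c)
```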

We also introduce an order parameter to measure the 
similarity of the sampled partitions at a given temper- 
ature, reflecting the uniqueness of the underlying com- 
munity structure. We look at frequency matrices, mea- 
suring the relative number of times each pair of nodes is 
classified as belonging to the same community. If only 
one community structure is observed, as is generally the 
case at sufficiently low temperatures, the same pairs of 
nodes are assigned to the same community in each parti- 
tion, giving a block diagonal matrix. At infinite temper- 
atures, each partition has equivalent weight and any pair 
of nodes is equally likely to be together, giving a homoge- 
neous matrix. At intermediate temperatures, there may 
be some remnants of community structure, with some 
pairs of nodes being assigned to the same community 
relatively often, showing as darker areas in the frequency 
matrix. 
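
A frequency matrix of this kind can be accumulated from the sampled partitions as follows. The sketch assumes each sampled partition is stored as a list of community labels, one per node, and that the samples are equally weighted draws from the ensemble at one temperature.

```python
import numpy as np

def frequency_matrix(partitions):
    """Fraction of sampled partitions in which each pair of nodes shares a community."""
    P = np.asarray(partitions)                  # shape (samples, N)
    same = P[:, :, None] == P[:, None, :]       # shape (samples, N, N)
    return same.mean(axis=0)

samples = [[0, 0, 1, 1],
           [0, 0, 0, 1]]
print(frequency_matrix(samples))
```

In this small example nodes 0 and 1 are always together (entry 1), nodes 0 and 3 never are (entry 0), and nodes 0 and 2 are together in half of the samples (entry 0.5).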

This uniqueness can be quantified using the Fiedler 
eigenvalue λ_2, i.e. the magnitude of the second 
smallest eigenvalue (the smallest being zero) of the Lapla- 
cian. The elements of the Laplacian matrix are the neg- 
ative of the elements of the frequency matrix, except for 
the diagonals, which are sums over the corresponding 
row. For comparisons between different temperatures, 
the frequency matrix is first rescaled by the sum of all 
the elements, which is larger when there are fewer, larger 
communities at lower temperatures. λ_2 gives a measure 
of how 'diagonal' the matrix is, and takes values of zero 
for a block diagonal matrix and one for a homogeneous 
matrix. It therefore gives a measure of how different the 
partitions at each temperature are and can be used as 
an order parameter, since it is close to one at high tem- 
peratures and decreases at low temperatures. If the best 
community structure, seen at low temperatures, is de- 
generate, i.e. there are competing partitions with equal 
values of Q, then λ_2 will remain above zero as the tem- 
perature decreases to zero. Even if λ_2 does go to zero, 
it provides insight into the character of the transition. 
A sharp transition indicates that there are few compet- 
ing structures with high Q, whereas a broader transition 
implies the opposite. 
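
The order parameter can be computed from a frequency matrix as sketched below. The exact rescaling is an assumption of this sketch: the matrix is divided by its mean row sum (the sum of all elements divided by N), a choice made here so that a uniform matrix gives exactly one and a block diagonal matrix gives zero.

```python
import numpy as np

def fiedler_order_parameter(F):
    """Second-smallest Laplacian eigenvalue of a rescaled frequency matrix,
    together with the corresponding (Fiedler) eigenvector."""
    F = np.asarray(F, dtype=float)
    N = F.shape[0]
    W = F * N / F.sum()                       # assumed normalisation: mean row sum of one
    L = np.diag(W.sum(axis=1)) - W            # off-diagonals -W_ij, rows sum to zero
    eigenvalues, eigenvectors = np.linalg.eigh(L)   # eigenvalues in ascending order
    return eigenvalues[1], eigenvectors[:, 1]

homogeneous = np.ones((6, 6))
blocks = np.kron(np.eye(2), np.ones((3, 3)))  # two communities of three nodes
print(fiedler_order_parameter(homogeneous)[0])
print(fiedler_order_parameter(blocks)[0])
```

Sorting the nodes by their component in the returned eigenvector (for example with `np.argsort`) gives an ordering in which nodes that are frequently grouped together sit next to each other.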

The eigenvector corresponding to the Fiedler eigen- 
value gives the best order in which to place the nodes 
to make the frequency matrix look maximally 'blocky', 
and assists with assigning the nodes to communities. In- 
terestingly, the Fiedler eigenvalue is also used in spectral 
partitioning to find communities [23, 53], the difference 
being that spectral partitioning sorts the adjacency ma- 
trix into its most blocky form, whereas here we apply it 
to the frequency matrix. It has also been recently used 
to find communities by maximising Q. 



III. TEST NETWORKS 

The test networks used consist of 128 nodes, each des- 
ignated as belonging to one of four communities of 32 
nodes each [3]. 1024 edges are added. Of these, 1024 P_in 
connect nodes belonging to the same community and 
the remaining 1024(1 - P_in) connect nodes in different 
communities. P_in is varied between 0.25 and 1. When 
P_in = 1, nodes are only connected to other nodes in the 
same community, giving four components corresponding 
to the four communities. When P_in = 0.25, one quar- 
ter of the edges leading from a node are expected to be 
connected to nodes in the same community. Based on 
the sizes of the communities, this is exactly as would be 
expected if the nodes were connected at random. There- 
fore, when P_in = 0.25 the network is essentially random, 
i.e. an Erdős-Rényi graph. 
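
A generator for networks of this type might look as follows. This is a sketch under stated assumptions: edges are drawn without replacement, so there are no multiple edges or self-loops, and the number of intra-community edges is rounded to the nearest integer. Neither detail is specified above, and the function name is hypothetical.

```python
import random
from itertools import combinations

def make_test_network(p_in, n_comm=4, size=32, n_edges=1024, rng=random):
    """Return (edges, community) for a planted-partition test network."""
    community = {node: node // size for node in range(n_comm * size)}
    intra, inter = [], []
    for u, v in combinations(community, 2):
        (intra if community[u] == community[v] else inter).append((u, v))
    n_intra = round(n_edges * p_in)           # edges inside communities
    edges = rng.sample(intra, n_intra) + rng.sample(inter, n_edges - n_intra)
    return edges, community

edges, community = make_test_network(0.75)
inside = sum(community[u] == community[v] for u, v in edges)
print(len(edges), inside / len(edges))        # 1024 0.75
```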

FIG. 1: (Colour online) The highest observed value of Q, 
Q_max, against P_in for the test networks. For P_in > 0.5, Q_max 
increases with P_in as P_in - 0.25 (also shown). 

P_in = 0.5 is an interesting case in that a node is as 
likely to be connected to nodes in its own community as 
it is to nodes in other communities. Indeed, we do see 
some differences between P_in < 0.5 and P_in > 0.5. As- 
suming that all nodes have the average degree, it can then 
be shown that the modularity for the input community 
structure would be given by Q_input = P_in - 0.25. Fig. 1 
shows that Q_max follows this expression for P_in > 0.5. 
However, for P_in < 0.5, we find a global maximum (GM) 
that is better (has higher Q) than the input community 
structure. This is because random networks often have 
partitions with reasonably high modularity as fluctua- 
tions in the link distributions are bound to give some 
partitions with Q significantly above zero [30]. The value of 
Q_max for P_in < 0.5 has been predicted to be Q_max ≈ 0.21 
by Guimera et al. [30] and Q_max ≈ 0.23 by Reichardt and 
Bornholdt [23]. Both are in line with the result obtained 
here (Q_max ≈ 0.24), which is for just one realization of 
the possible networks that have P_in = 0.25. 
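
The expression for the modularity of the input community structure follows directly from the definition of Q. As a worked check, assuming every node has the average degree \bar{k} = 2M/128, so that each of the four communities of 32 nodes holds exactly one quarter of all edge ends:

```latex
\begin{align}
Q_{\mathrm{input}}
  &= \sum_{c=1}^{4} \left[ \frac{e_c}{M}
     - \left( \frac{\sum_{i \in c} k_i}{2M} \right)^{2} \right] \\
  &= \frac{1}{M} \sum_{c=1}^{4} e_c
     - 4 \left( \frac{32\,\bar{k}}{128\,\bar{k}} \right)^{2} \\
  &= P_{\mathrm{in}} - 4 \left( \frac{1}{4} \right)^{2}
   = P_{\mathrm{in}} - 0.25 .
\end{align}
```

The first term uses the fact that, by construction, a fraction P_in of the M edges lies within the input communities.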

For these networks, there is a clear transition as a func- 
tion of temperature between a high Q, low entropy phase 
at low temperature, and a low Q, high entropy phase at 
high temperature, as can be seen in Fig. 2. This tran- 
sition to stronger community structure at low tempera- 
ture, which is indicated by a peak in the heat capacity in 
Fig. 2(b), is sharper for high P_in, and becomes similar to 
a first-order phase transition. This feature implies coop- 
erativity, i.e. on heating, different communities break up 
at the same temperature. This is because the community 
structure in these test networks is very homogeneous: all 
nodes have roughly the same degree, and the input com- 
munities all have the same size and strength. 

By contrast, for P_in < 0.5, the GMs consist of different 
sized communities, and because of these heterogeneities 
the transition is much broader, with some communities 
breaking up before others. As the GM, Q_max and the 
number of edges within communities are similar for each 
of these networks, the position and form of the peak be- 
come approximately independent of P_in for P_in < 0.5. 

For P_in > 0.5, the transition occurs at higher tempera- 
tures for higher P_in as the communities are more difficult 
to break up on heating, and easier to find on cooling. The 
'energy cost' (decrease in Q) to break up a community 
depends on how cohesive that community is, and by de- 
sign is larger for higher P_in, leading to a higher transition 
temperature. 

Distributions of Q at different temperatures are shown 
for P_in = 0.25 and 1 in Fig. 3. At high P_in the dis- 
tributions in the transition region are bimodal, which is 
a characteristic of a first-order phase transition. As few 
partitions are observed with intermediate values of Q, the 
system switches rapidly between high and low Q parti- 
tions, leading to a sharper transition. By contrast, at 
lower P_in the distributions are always unimodal, and the 
modal value changes smoothly across the transition. 

FIG. 2: (Colour online) Thermodynamic properties for test 
networks. Variation of (a) Q, (b) heat capacity C, (c) λ_2 
and (d) average community size with temperature. In (d), 
the community size of the input community structure, which 
is 32, is also shown. Equally spaced values of P_in have been 
used, with every other line labelled by its P_in value. Note 
that in (b) the heat capacity has been plotted on a log scale, 
unlike in the following figures. 

FIG. 3: (Colour online) Distributions of Q seen at different 
temperatures for test networks with (a) P_in = 1 and (b) P_in = 
0.25. T_c is the transition temperature indicated by a peak in 
the heat capacity. 

Possible alternative community structures to the GM 
would involve either some nodes swapping between dif- 
ferent communities or groups of nodes breaking away 
from the main communities. Both of these are harder 
for higher P_in, where there are more links to the main 
community and fewer links to nodes from other communi- 
ties. Therefore, at P_in = 1 the only alternatives seen with 
any statistical significance are thermal fluctuations about 
the GM. As P_in decreases, the preferred community of a 
node becomes less well-defined as it has more connections 
to other communities. Consequently, some frustration is 
introduced, thus making alternative partitions more rea- 
sonable, i.e. they have higher modularity and are sampled 
at intermediate temperatures. This leads to the transi- 
tion becoming broader as P_in decreases. 

These changes are also reflected in the temperature 
dependence of the average community size (Fig. 2(d)). 
For P_in = 1, at T_c there is a sharp jump to larger com- 
munities, although only to approximately 23 nodes per 
community rather than 32 as in the GM. A typical par- 
tition at T_c consists of four large communities similar to 
those in the GM, with a few very small 'communities' of 
one or two nodes alongside. There is a heavily preferred 
scale to the community structure, and very different sizes 
are hardly ever seen. As the temperature decreases, these 
fluctuations die away and the partitions settle to the GM, 
where all communities have size 32. 

FIG. 4: Frequency matrices at four different temperatures 
for test networks with (a) P_in = 1, (b) P_in = 0.25. 

FIG. 5: (Colour online) Best community structure found for 
the hexagonal lattice, having Q = 0.6520. Each node has de- 
gree 6. Periodic boundary conditions are used, and a single 
cell containing 225 nodes is illustrated. The solid lines repre- 
sent boundaries between communities, and the dashed lines 
the boundaries of the cell. 

For lower P_in, the jump at T_c becomes less pronounced, 
and immediately below T_c the average community size is 
smaller. The GM is less dominant because nodes have 
more edges to nodes outside their prescribed community, 
so they are less strongly attached to that community. 
Rather than just fluctuations about the GM occurring, 
communities with a range of sizes are observed. 
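
The average community size discussed in this section can be obtained from the sampled partitions as below. The precise definition is an assumption of this sketch: the size is averaged over the communities of each partition (N divided by the number of communities), and then over the sampled partitions.

```python
def average_community_size(partitions):
    """Mean over sampled partitions of N / (number of communities)."""
    sizes = [len(p) / len(set(p)) for p in partitions]
    return sum(sizes) / len(sizes)

samples = [[0, 0, 1, 1],      # two communities of two nodes
           [0, 1, 2, 3]]      # four single-node communities
print(average_community_size(samples))   # (2 + 1) / 2 = 1.5
```

Under this definition, a few one- or two-node 'communities' alongside four large ones lower the average markedly: 128 nodes split into five or six communities gives an average of roughly 21 to 26.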

Frequency matrices for ensembles at selected temper- 
atures are shown in Fig. 4. At the lowest temperature, 
the matrices are block diagonal, because all partitions in 
the ensemble have the same community structure. Each 
block corresponds to one community. As the tempera- 
ture is increased, the ensembles consist of partitions with 
different community structure and the matrices become 
more homogeneous, with some clustering around the di- 
agonal. At the highest temperature, the partitions are es- 
sentially random and all pairs of nodes are equally likely 
to be together, giving a very homogeneous matrix. 

For P_in > 0.5 the GM is the four input communities 
of equal size. On increasing the temperature the parti- 
tions remain similar up to T_c. Traces of the underlying 
community structure are clear even at high temperatures 
when only small communities are present, reflecting the 
preference for nodes to mix with nodes from the same 
predefined communities. For P_in < 0.5 the GM has six 
communities of different sizes. For P_in = 0.25, some 
nodes show only a weak preference for their own com- 
munity, and at low temperature can switch between two 
or three different communities, giving rise to the strips 
evident in Fig. 4(b), which are a signature of overlap- 
ping community structure. There are also some small 
communities that have broken away from the main com- 
munities. Although at T_c, Q has decreased by only 35% 
from its maximum value, the original community struc- 
ture has virtually disappeared, to be replaced by an en- 
semble of very different partitions, thus giving a fairly 
homogeneous frequency matrix. 

The variation from homogeneous to block diagonal in 
the frequency matrices is reflected in λ_2, which is close 
to one at high temperatures and approaches zero at low 
temperatures, as shown in Fig. 2(c). The GM is unique 
for each of these networks, so λ_2 must be zero at T = 0. 
However, the approach to zero differs. For P_in = 1, the 
transition is very sharp and λ_2 is close to zero immedi- 
ately below the transition, reflecting the uniqueness and 
strength of the community structure, and the smallness 
of the fluctuations about the input communities. For 
P_in < 0.5, λ_2 is significantly above zero for temperatures 
well below T_c, because in these more random graphs there 
are different partitions with similarly high values of Q. 
Thus, λ_2 is able to detect this non-uniqueness and lack of 
robustness in the communities in an effectively random 
graph. 

FIG. 6: (Colour online) Variation of (a) Q, heat capacity C, 
and λ_2 and (b) average community size with temperature for 
the hexagonal lattice. 



IV. REGULAR LATTICE 

We study a regular lattice next because it shows up 
as a false positive in community detection algorithms, 
i.e. there are partitions with high values of Q but the 
network has no real community structure. We 
apply the method to a regular hexagonal lattice with 225 
nodes, each with degree k = 6, using periodic boundary 
conditions (Fig. 5). 

The GM that we obtain has high Q, and is shown 
in Fig. 5. In the GM, the network is partitioned into 
hexagons that are close to regular, because this struc- 
ture minimizes the number of edges between communi- 
ties. As the GM is degenerate (translating the bound- 
aries of these communities gives a new partition with the 
same high value of Q), λ_2 does not tend towards zero at 
low temperature (Fig. 6). Similarly, the frequency ma- 
trix at low temperature is not block diagonal because of 
the many different high Q partitions (Fig. 7), but neither 
is it entirely homogeneous, because nodes that are close 
together in space are more likely to be assigned to the 
same community. 

Despite these features, the transition is relatively 
sharp, and Q remains large until close to the transition. 
Unlike the model networks with higher P_in considered 
in the last section, partitions with intermediate values 
of Q can easily be generated by decreasing the size of 
the spatially-localized communities. Reflecting this dif- 
ference, the distribution of Q is unimodal at all temper- 
atures (Fig. 8). At T_c, the distribution is very broad, 
sampling lots of partitions with a wide range of Q. 

Similarly, there are no jumps in the temperature de- 
pendence of the average community size as in Fig. 2(d), 
but instead it decreases relatively smoothly with temper- 
ature, with the two heat capacity peaks corresponding to 
the most rapid decreases in the community size (Fig. 6). 
The largest heat capacity peak is sharp, probably reflect- 
ing the uniformity of the network, which makes it more 
likely that changes to the community structure occur si- 
multaneously throughout the network. 

FIG. 7: Frequency matrices at different temperatures for the 
hexagonal lattice. 

FIG. 8: (Colour online) Distributions of Q at different tem- 
peratures for the hexagonal lattice. 



V. APOLLONIAN PACKING 

The network of contacts between disks in an Apollo- 
nian packing is an example of a scale-free, hierarchical 
and spatial network [45]. The packing is a fractal 
object [56], and space is filled with different-sized disks, 
as shown in Fig. 9. Similar to the regular lattice, this 
Apollonian network has community structure with high 
Q (Fig. 9 shows the GM) due to its spatial nature, and 
the resulting communities are spatially localized. How- 
ever, Q_max is not as large as for the regular lattice of the 
previous section, probably because scale-free networks 
are inherently less separable, since high-degree nodes 
connect up many parts of the network. 

FIG. 9: (Colour online) Best community structure found for 
an Apollonian network, having Q = 0.5030. The initial con- 
figuration for generating the Apollonian packing is three disks 
all touching each other. Then in the interstice between them, 
the largest possible disk is inserted, creating three new inter- 
stices. These interstices can in turn be filled. This process 
can be iterated until the whole of space is completely filled 
by disks. In the Apollonian network the disks correspond to 
the nodes, and edges occur wherever two disks are in contact. 
In the packing illustrated, disk insertion was stopped after 
five iterations, giving a finite network with 124 nodes. The 
communities are represented in (a) by different shades and in 
(b) the communities have been separated for clarity. 
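
The construction of the Apollonian network described in the caption of Fig. 9 needs no geometry: each interstice is bounded by three mutually touching disks, and filling it adds one node connected to those three. The following sketch tracks interstices as triples of node indices; the function name is illustrative.

```python
def apollonian_network(iterations):
    """Return (number of nodes, edge list) after the given number of insertion rounds."""
    edges = [(0, 1), (0, 2), (1, 2)]     # three mutually touching initial disks
    interstices = [(0, 1, 2)]            # the single gap between them
    n_nodes = 3
    for _ in range(iterations):
        next_interstices = []
        for a, b, c in interstices:
            d = n_nodes                  # new disk touching a, b and c
            n_nodes += 1
            edges += [(a, d), (b, d), (c, d)]
            next_interstices += [(a, b, d), (a, c, d), (b, c, d)]
        interstices = next_interstices
    return n_nodes, edges

n_nodes, edges = apollonian_network(5)
print(n_nodes, len(edges))               # 124 366
```

Five rounds add 1 + 3 + 9 + 27 + 81 = 121 disks to the initial three, reproducing the 124 nodes quoted in the caption.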

Fig. 10 shows that the thermally-induced transition is 
much broader and less cooperative for the Apollonian 
network than for the test networks and the regular lat- 
tice. In those networks the degree distribution is very 
narrow, each node has a similar degree, and the commu- 
nities also have similar sizes. On heating, each node is 
similarly likely to break away from its community at the 
same temperature, leading to a sharp transition. In a 
scale-free network, where the nodes have very different 
degrees, a low-degree node may be more likely to break 
away from its community and start a new one on heating, 
whereas a high-degree node is likely to have more connec- 
tions to its community and so starting a new community 
would involve a large decrease in Q. On the other hand, 
a high-degree node may be more likely to switch to a 
different community, as it probably also has a significant 
number of connections to other communities, whereas a 
low-degree node could easily have all of its connections 
within one community and so strongly prefer that com- 
munity. 

λ_2 (Fig. 10) is much closer to zero at low temperatures 
than for the lattice, reflecting the more block diagonal 
frequency matrix (Fig. 11), but it does not reach 
zero, because there is some degeneracy due to the three- 
fold symmetry of the packing. The low-temperature 
frequency matrix reflects the hierarchical nature of the 
modularity, i.e. there are competing 'length scales' for 
the community structure. The chequerboard pattern on 
the diagonal is due to competing partitions that have 
only marginally different Q values, the difference being 
whether to group or break up certain communities. There 
are some thin strips of colour corresponding to nodes that 
can belong to any of a range of communities, providing 
some overlap between them. These are the hubs in the 
network, namely the large central disk and the three large 
peripheral disks (only partially shown in Fig. 9). 

FIG. 10: (Colour online) Variation of Q, heat capacity C, and 
λ_2 with temperature for the Apollonian network. 

FIG. 11: Frequency matrices at different temperatures for 
the Apollonian packing. 

FIG. 12: (Colour online) Variation of (a) Q, (b) heat ca- 
pacity C, and (c) λ_2 with temperature for the scale-free test 
networks. The lines are labelled by the target values of P_in 
used in the network generation. 



VI. SCALE-FREE TEST NETWORKS 

Next, we examine randomized versions [H^ of the 
Apollonian network of the preceding section, with community
structure of varying strength introduced. Each node was
assigned to one of four communities with equal probability,
independent of its degree, i.e. hubs were neither
preferentially assigned to the same community nor to different
communities. Edges were then rewired [H^, while 
maintaining the degree distribution, and with a predeter- 
mined preference for edges to lie within the communities, 
thus enabling differing values of P_in to be generated. Due




FIG. 13: (Colour online) Distributions of Q at different
temperatures for the scale-free test networks with (a) P_in = 0.9126
and (b) P_in = 0.2514.



to the stochastic nature by which the community structure
was introduced, the values of P_in that we obtained (0.2514,
0.5000, 0.7486 and 0.9126) did not necessarily match the
target values precisely. Because the networks are scale-free
and the nodes are assigned to communities at random,
P_in = 0.9126 is the highest value that we could obtain.
These networks have exactly the same degree distribution as
the Apollonian network, so it is possible to infer both the
effect of the scale-free degree distribution, by comparison
with the standard test networks of Section III, and which
features of the Apollonian network are due to its spatial
nature and organization.
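
The construction described above can be illustrated with a short Python sketch. It is not the rewiring code used for this work: it assumes that P_in is the fraction of edges lying within communities, and it biases the rewiring in the simplest way we could think of, by accepting only those degree-preserving double-edge swaps that do not reduce the number of intra-community edges. The function names, the stopping rule and the parameters `max_steps` and `seed` are all illustrative assumptions.

```python
import random


def assign_communities(n_nodes, n_communities=4, seed=0):
    """Assign each node to a community uniformly at random, ignoring degree."""
    rng = random.Random(seed)
    return {node: rng.randrange(n_communities) for node in range(n_nodes)}


def fraction_internal(edges, community):
    """Fraction of edges whose two end points share a community (assumed P_in)."""
    return sum(community[u] == community[v] for u, v in edges) / len(edges)


def rewire_towards_p_in(edges, community, target_p_in,
                        max_steps=100_000, seed=0):
    """Degree-preserving rewiring that pushes P_in up towards a target value.

    Two edges (a, b) and (c, d) are replaced by (a, d) and (c, b), so every
    node keeps its degree. Swaps that would create a self-loop or a duplicate
    edge, or that would lower the number of internal edges, are rejected.
    """
    rng = random.Random(seed)
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    n_in = sum(community[u] == community[v] for u, v in edges)
    for _ in range(max_steps):
        if n_in / len(edges) >= target_p_in:
            break
        i, j = rng.sample(range(len(edges)), 2)
        a, b = edges[i]
        c, d = edges[j]
        if rng.random() < 0.5:          # try both orientations of the second edge
            c, d = d, c
        if a == d or c == b:            # would create a self-loop
            continue
        if frozenset((a, d)) in present or frozenset((c, b)) in present:
            continue                    # would create a duplicate edge
        gain = ((community[a] == community[d]) + (community[c] == community[b])
                - (community[a] == community[b]) - (community[c] == community[d]))
        if gain < 0:
            continue
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {frozenset((a, d)), frozenset((c, b))}
        edges[i], edges[j] = (a, d), (c, b)
        n_in += gain
    return edges
```

Because the swaps are stochastic and the loop stops as soon as the target is reached or exceeded, the final value returned by `fraction_internal` will generally differ slightly from `target_p_in`, in the same spirit as the mismatch noted above.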

In comparison with the standard test networks, the
main difference is that the transitions are much broader
(Fig. 12), because the greater heterogeneity inherent to
scale-free networks leads to a wider range of temperatures
at which it becomes favourable for nodes to break away
from their assigned communities. There is less difference
between the P_in = 0.2514 and P_in = 0.9126 networks in
the position and height of the heat capacity peak than
for the standard test networks.

As with the standard test networks, there is still a clear
difference between P_in < 0.5 and P_in > 0.5. For example,
Q_max does not correspond to the input community structure
for P_in < 0.5. Furthermore, the frequency matrices
at P_in ≈ 0.25 show a very similar temperature evolution.
In particular, there is again evidence that at low
temperature there are sets of nodes that can nearly equally
well belong to a number of communities. However, there
is less indication of this lack of uniqueness in the
low-temperature form for A2 than for the standard networks.

FIG. 14: Frequency matrices at different temperatures for
scale-free test networks with (a) P_in = 0.9126, (b) P_in =
0.2514.

The Apollonian network has a very similar Q_max to
the test network with P_in ≈ 0.75. Aside from the
hierarchical aspects of the frequency matrices for the
Apollonian packing, the properties of the two networks are
extremely similar, showing that the degree distribution
is the dominant determinant of the behaviour. Even for
P_in close to one, a bimodal distribution of Q values is not
seen (Fig. 13), unlike for the corresponding test network in
Section III. The distributions are much broader around
T_c when stronger community structure is present. Q is
sensitive to small changes in temperature, leading to a
sharp transition, but rather than switching between two
distinct phases, this change is continuous.



VII. METABOLIC NETWORK 

We now apply this approach to a metabolic network
describing pathways between metabolites that has been
found to have a scale-free topology [47]. In this network,
nodes represent metabolites and edges connect those that
are interconverted by a chemical reaction in E. coli. Each
chemical reaction can be classified as belonging to a par- 
ticular metabolic pathway that performs a certain func- 
tion, e.g. carbohydrate metabolism. We first reduced the 
size of the network from 896 to 304 nodes, by recursively 
removing all k = 1 nodes by assuming that they belong 
to the same community as the one node to which they are 
connected. The metabolic network has a very high Q_max,
as noted previously 0, [HI El E3 •> indicating that it is a 
FIG. 15: (Colour online) Variation of Q, heat capacity C, and
A2 with temperature for the metabolic network. 



highly modular network, where the different modules are 
responsible for different functions. 
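
The reduction step described above, in which k = 1 nodes are removed recursively and inherit the community of their single neighbour, can be sketched as follows. This is an illustrative Python version only; the adjacency-dictionary representation and the names `prune_leaves` and `parent` are assumptions made for the example, not the code used to reduce the network from 896 to 304 nodes.

```python
def prune_leaves(adjacency):
    """Recursively remove degree-one nodes from an undirected graph.

    adjacency maps each node to the set of its neighbours. Returns the pruned
    adjacency and a dict `parent` recording, for every removed node, the
    neighbour whose community it is assumed to share.
    """
    adj = {u: set(vs) for u, vs in adjacency.items()}
    parent = {}
    leaves = [u for u, vs in adj.items() if len(vs) == 1]
    while leaves:
        u = leaves.pop()
        if u not in adj or len(adj[u]) != 1:
            continue                    # already removed, or no longer a leaf
        (v,) = adj[u]
        parent[u] = v
        del adj[u]
        adj[v].discard(u)
        if len(adj[v]) == 1:            # removing u may have created a new leaf
            leaves.append(v)
    return adj, parent


def surviving_node(node, parent):
    """Follow the parent links from a removed node to a node that was kept."""
    while node in parent:
        node = parent[node]
    return node
```

After a partition of the reduced network has been found, `surviving_node` gives the retained node whose community label each removed metabolite would take.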

Similar to the other scale-free networks, the transition
is fairly broad (Fig. 15). The breadth reflects the broader
degree distribution, competing length scales, and
heterogeneities in the community structure. At intermediate
temperatures, it is clear that some of the smaller
communities break up and merge with other communities,
while others remain intact (Fig. 16). The communities
that break up only integrate with a limited number of
other nodes, leading to the weak blockiness exemplified
in Fig. 16. In the networks studied so far, the communities
have all been fairly similar, whereas in this example
some communities are stronger than others, and as
such more difficult to break up, thus contributing to the
broadness of the transition.

Even though some hierarchy can be seen in the fre- 
quency matrix, A2 smoothly tends towards zero at low 
temperatures, meaning one partition dominates. This 
result implies that the uncovered community structure is 
unique and therefore significant. Although a few of the 
low temperature communities involve metabolites associ- 
ated with different pathways, most communities tend to 
correspond to a single pathway, as illustrated in Fig. 16.
The stronger communities are subgroups of those cor- 
responding to nucleotide and carbohydrate metabolism, 
and to a lesser extent amino acid metabolism. 



■ Carbohydrate metabolism
■ Lipid metabolism
■ Nucleotide metabolism
■ Amino acid metabolism
■ Metabolism of cofactors and vitamins
■ Glycan biosynthesis and metabolism
■ Xenobiotics biodegradation and metabolism
FIG. 16: (Colour online) Frequency matrices at different
temperatures for the metabolic network. If two metabolites are
in the same metabolic pathway (see key), the corresponding 
square is given the colour assigned to that pathway, other- 
wise the square is coloured black. In the case that the pair 
of metabolites have more than one pathway in common, the 
colour of the square is chosen according to that of its neigh- 
bours. In the rare case that the colour remains ambiguous, 
one of the relevant colours is chosen at random. 



VIII. CONCLUSIONS 

To assess the significance and nature of the commu- 
nity structure obtained by algorithms that optimize the 
modularity, we have studied how canonical ensembles of 
network partitions depend on temperature, where −Q
plays the role of energy. Typically, there is a transition
from low-entropy, high-Q partitions to high-entropy, low-Q,
essentially random partitions as the temperature is
increased. The heat capacity provides a useful probe of



this transition. If there is strong community structure, 
the transition is sharp. The peak is broader for networks 
with weaker community structure, as there are more rea- 
sonable alternative partitions with intermediate values of 
Q, and so the transition occurs over a broader range of
temperature. If a network has a scale-free or broader de- 
gree distribution, the transition also tends to be broader, 
because the network is more heterogeneous and hubs are 
likely to make it difficult to separate a network into dif- 
ferent modules. 
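
To make the ensemble picture concrete, the following Python sketch samples community assignments with Boltzmann weight exp(Q/T), i.e. with −Q as the energy, and estimates the heat capacity from the fluctuations, C = (⟨Q²⟩ − ⟨Q⟩²)/T², taking the Boltzmann constant as unity. It is a plain single-temperature Metropolis scheme chosen for brevity and should not be read as the sampling method used in this paper. Note also that it samples labelled assignments, so partitions with different numbers of communities acquire different label-counting multiplicities; a careful implementation would have to correct for this. All function names and parameters are illustrative assumptions.

```python
import math
import random


def modularity(edges, labels):
    """Newman-Girvan modularity Q of an undirected, unweighted simple graph."""
    m = len(edges)
    internal = {}   # community -> number of edges inside it
    degree = {}     # community -> summed degree of its nodes
    for u, v in edges:
        cu, cv = labels[u], labels[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree.items())


def sample_q(edges, n_nodes, temperature, n_sweeps=200, seed=0):
    """Metropolis sampling of label assignments; returns one Q value per sweep."""
    rng = random.Random(seed)
    labels = list(range(n_nodes))       # start with every node on its own
    q = modularity(edges, labels)
    samples = []
    for _ in range(n_sweeps):
        for _ in range(n_nodes):
            node = rng.randrange(n_nodes)
            old = labels[node]
            labels[node] = rng.randrange(n_nodes)   # symmetric proposal
            q_new = modularity(edges, labels)
            # Energy is -Q, so moves that raise Q are always accepted.
            if q_new >= q or rng.random() < math.exp((q_new - q) / temperature):
                q = q_new
            else:
                labels[node] = old
        samples.append(q)
    return samples


def heat_capacity(q_samples, temperature):
    """Fluctuation estimate C = var(Q) / T^2 (Boltzmann constant set to 1)."""
    mean = sum(q_samples) / len(q_samples)
    variance = sum((q - mean) ** 2 for q in q_samples) / len(q_samples)
    return variance / temperature ** 2
```

Repeating `sample_q` over a grid of temperatures and plotting the mean Q and `heat_capacity` against T would give curves of the kind discussed above, although recomputing Q from scratch at every step makes this version practical only for small networks.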

We have introduced an order parameter A2 to quantify
the uniqueness of the community structure, i.e. whether
there is just a single partition with high Q or a number of
competing partitions. A2 is therefore useful as a tool to
detect false positives, i.e. partitions whose high values
of Q do not reflect significant community structure.
In the case of a lattice, the partition with the highest Q
value is highly degenerate and therefore not significant.
Consistent with this, A2 at T = 0 is non-zero. For the
random networks, A2 is zero at the lowest temperatures
because the GM is non-degenerate. However, below T_c
A2 takes significant values (around 0.2-0.3), reflecting the
many competing partitions that detract from the
significance of the GM.

Frequency matrices, in particular their temperature
dependence, provide the most information. For example,
it is possible to determine whether the topology is based
around a single community structure with small
fluctuations, implying that the network is composed of weakly
interacting modules, as in the test networks with high
P_in. Furthermore, more complex modularity features can
also be visualised. For example, in the Apollonian
packing, the hierarchical nature is clear from the
chequerboard pattern along the diagonal, and nodes with
associations to more than one community are revealed,
reflecting overlapping community structures. The
temperature evolution of the frequency matrix for the metabolic
network is particularly interesting, as heterogeneities in
the community structure are apparent, with the stronger
communities persisting to higher temperature.
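
As an illustration of how a frequency matrix can be accumulated from an ensemble, the Python sketch below counts, for every pair of nodes, the fraction of sampled partitions in which the two nodes share a community. It assumes that each sample is a list of community labels indexed by node and that the samples are already drawn with the appropriate statistical weight; the function name is illustrative, and the ordering of nodes needed to display the block structure is not addressed.

```python
def frequency_matrix(label_samples):
    """Fraction of samples in which each pair of nodes shares a community.

    label_samples is a non-empty list of equal-length label lists, one per
    sampled partition. Entry [i][j] of the result lies between 0 and 1, and
    every diagonal entry equals 1.
    """
    n_nodes = len(label_samples[0])
    counts = [[0] * n_nodes for _ in range(n_nodes)]
    for labels in label_samples:
        for i in range(n_nodes):
            for j in range(n_nodes):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    total = len(label_samples)
    return [[count / total for count in row] for row in counts]
```

For a network dominated by a single partition the entries are close to 0 or 1, whereas competing partitions show up as intermediate values.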

In summary, the results in this paper highlight the 
importance of studying more than simply the 'best' par- 
tition into communities, and the methods introduced in 
this paper provide a rigorous approach for doing so. 